Asynchronous SGD





Shadowheart SGD: Distributed Asynchronous SGD with Optimal Time Complexity Under Arbitrary Computation and Communication Heterogeneity

Neural Information Processing Systems

We consider nonconvex stochastic optimization problems in the asynchronous centralized distributed setup, where the communication times from workers to a server cannot be ignored and the computation and communication times are potentially different for all workers.





Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays

Neural Information Processing Systems

The analysis builds on "virtual iterates" and delay-adaptive stepsizes, which allow us to derive state-of-the-art guarantees for both convex and non-convex objectives.
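A delay-adaptive stepsize can be sketched as follows. This is an illustrative rule, not the paper's exact scheme; the function name, the `threshold` parameter, and the `threshold/delay` scaling are assumptions. The idea it captures is that gradients with small delay receive the full stepsize, while stale gradients are damped in proportion to their delay.

```python
def delay_adaptive_stepsize(eta_base, delay, threshold=4):
    """Illustrative delay-adaptive rule (hypothetical, for intuition only):
    use the full stepsize for gradients with delay at most `threshold`,
    and scale the stepsize by threshold/delay for staler gradients."""
    if delay <= threshold:
        return eta_base
    return eta_base * threshold / delay
```

For example, with `eta_base=0.1` and `threshold=4`, a gradient delayed by 8 steps is applied with stepsize 0.05, so a single very stale gradient cannot destabilize the iterates.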



Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning

Neural Information Processing Systems

We study the asynchronous stochastic gradient descent algorithm for distributed training over $n$ workers that might be heterogeneous. In this algorithm, workers compute stochastic gradients in parallel at their own pace and return them to the server without any synchronization. Existing convergence rates of this algorithm for non-convex smooth objectives depend on the maximum delay $\tau_{\max}$ and reach an $\epsilon$-stationary point after $O\!\left(\sigma^2\epsilon^{-2}+\tau_{\max}\epsilon^{-1}\right)$ iterations, where $\sigma$ is the variance of stochastic gradients. We also provide (ii) a simple delay-adaptive learning rate scheme, under which asynchronous SGD achieves a convergence rate of $O\!\left(\sigma^2\epsilon^{-2}+\tau_{avg}\epsilon^{-1}\right)$, where $\tau_{avg}$ is the average delay, and which does not require any extra hyperparameter tuning nor extra communications. In addition, (iii) we consider the case of heterogeneous functions motivated by federated learning applications and improve the convergence rate by proving a weaker dependence on the maximum delay compared to prior works.
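As a concrete picture of the setting these papers analyze, here is a hypothetical toy simulation of asynchronous SGD on a one-dimensional quadratic: each worker holds a stale copy of the model, and the server applies each stochastic gradient as it arrives, without synchronization. The function name, constants, and round-robin arrival order are illustrative assumptions, not any paper's actual algorithm.

```python
import random

def async_sgd_quadratic(n_workers=4, steps=200, eta=0.1, seed=0):
    """Toy asynchronous SGD on f(x) = 0.5 * x**2, so grad f(x) = x.

    Workers report in round-robin order, so every gradient is computed
    on a model copy that is n_workers updates stale (a fixed delay tau).
    """
    rng = random.Random(seed)
    x = 5.0                          # current server model
    snapshots = [x] * n_workers      # stale model copy held by each worker
    for t in range(steps):
        w = t % n_workers            # worker whose gradient arrives next
        # stochastic gradient evaluated at the worker's stale snapshot
        grad = snapshots[w] + rng.gauss(0.0, 0.1)
        x -= eta * grad              # server applies the delayed gradient
        snapshots[w] = x             # worker then pulls the updated model
    return x
```

Despite every update using a delayed gradient, the iterates still contract toward the optimum at 0 because the stepsize is small relative to the delay, which is exactly the regime the convergence rates above quantify through $\tau_{\max}$ and $\tau_{avg}$.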